feat: weekly TechAPI refresh pipeline + benchmark enrichment #4

Merged
Seungpyo1007 merged 4 commits into main from feat/weekly-refresh-pipeline
Jun 1, 2026

@Seungpyo1007
Member

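Overview

A pipeline that runs once a week and automatically collects live benchmarks → validates the integrity of the full dataset → generates the static dump → opens a dated branch and an automated PR against the public TechAPI repository.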

Commits included

  • feat(ingest): multi-source benchmark enrichment runner + 9 scrapers (PassMark, technical.city, cgdirector, notebookcheck, SPEC CPU2006, topcpu.net, Blender, videocardbenchmark). Fills null columns only and never overwrites. Extends the CPU/GPU model benchmark fields and adds network-free unit tests.
  • feat(ci): .github/workflows/weekly-refresh.yml: Monday 06:00 UTC cron + manual run. Collect from 12 sources → app.validate + integrity_check.py --strict gate → app.dump → peter-evans/create-pull-request → refresh/<date> PR. Adds a --strict mode to integrity_check.py that blocks only on hard anomalies (duplicate slugs, slug ≠ filename, single>multi).
  • chore: link TechAPI as a submodule (gitlink, tracking main), for browsing/linking only.

Token behavior

The cross-repo PR step is guarded by secrets.TECHAPI_TOKEN. Without the token, the job still collects, validates, dumps, and uploads artifacts; only the PR is skipped.

Verification

  • The 9 ingest sources pass mypy --strict, and the network-free unit tests pass
  • integrity_check.py --strict confirmed to pass on the current data

Add a variant-safe enrichment runner (app/ingest/enrich.py) that fills null benchmark columns on existing TechAPI CPU/GPU records without ever overwriting, writing only on exact heading matches. Backed by per-source scrapers (PassMark, technical.city, cgdirector, notebookcheck, SPEC CPU2006, topcpu.net, Blender, videocardbenchmark) registered in a SOURCES table.
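
The fill-null, exact-match rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in app/ingest/enrich.py: the record shape (a dict with a `heading` key), the `fake_passmark` stand-in scraper, the column names, and the `enrich` signature are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping

# Hypothetical shapes: a record is a dict of column -> value (None means null),
# and each source returns {heading: {column: value}}.
Record = dict[str, Any]
SourceFn = Callable[[], Mapping[str, Mapping[str, Any]]]


def fake_passmark() -> Mapping[str, Mapping[str, Any]]:
    # Stand-in for a real scraper; returns canned data so the sketch runs offline.
    return {
        "Ryzen 7 5800X": {"passmark_multi": 28000, "passmark_single": 3450},
        "Ryzen 7 5800X3D": {"passmark_multi": 28500},
    }


# Registry of sources, loosely mirroring the idea of a SOURCES table.
SOURCES: dict[str, SourceFn] = {"passmark": fake_passmark}


def enrich(records: list[Record], sources: Mapping[str, SourceFn] = SOURCES) -> int:
    """Fill null columns in place and return the number of cells written."""
    by_heading = {rec["heading"]: rec for rec in records}
    written = 0
    for fetch in sources.values():
        for heading, columns in fetch().items():
            rec = by_heading.get(heading)  # exact match only, no fuzzy lookup
            if rec is None:
                continue
            for column, value in columns.items():
                # Write only to columns that exist on the record and are null.
                if column in rec and rec[column] is None and value is not None:
                    rec[column] = value
                    written += 1
    return written
```

Because the lookup is an exact dictionary match on the heading, a scraped row for one variant (say, an X3D part) cannot leak into a sibling record, and any value already present is left alone.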

Extend the CPU/GPU models with legacy + cross-aggregator benchmark fields, add network-free unit tests for the source parsers, and wire a cpu-only enrich step into weekly-ingest.
Add .github/workflows/weekly-refresh.yml: a Monday cron (and manual dispatch) that live-scrapes every CPU/GPU benchmark source into a TechAPI checkout, gates the full dataset on app.validate plus a strict integrity_check, regenerates the static v1 dump and openapi.json into site/public, and opens a dated refresh/<date> PR via peter-evans/create-pull-request.

The cross-repo PR step is guarded by secrets.TECHAPI_TOKEN; without it the job still collects, validates, dumps, and uploads artifacts. Add a --strict mode to integrity_check.py that exits non-zero on hard anomalies (duplicate slugs, slug/file mismatch, single>multi) while keeping statistical outliers advisory.
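
As a rough sketch of how a `--strict` gate of this kind might behave: hard anomalies are always reported, but they turn into a non-zero exit code only when the flag is set. The record fields (`slug`, `file_stem`, `single`, `multi`) and the function names are assumptions for illustration and are not taken from integrity_check.py; the advisory statistical-outlier checks are omitted.

```python
from __future__ import annotations

import argparse
from collections import Counter
from typing import Any

# Hypothetical record shape: slug, file stem, and single/multi-core scores.
Record = dict[str, Any]


def hard_anomalies(records: list[Record]) -> list[str]:
    """Return one message per hard anomaly found in the dataset."""
    problems: list[str] = []
    counts = Counter(rec["slug"] for rec in records)
    for slug, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate slug: {slug} ({n} records)")
    for rec in records:
        if rec["slug"] != rec["file_stem"]:
            problems.append(f"slug/file mismatch: {rec['slug']} vs {rec['file_stem']}")
        single, multi = rec.get("single"), rec.get("multi")
        if single is not None and multi is not None and single > multi:
            problems.append(f"single>multi: {rec['slug']} ({single} > {multi})")
    return problems


def run(records: list[Record], strict: bool) -> int:
    """Print every hard anomaly; fail (return 1) only in strict mode."""
    problems = hard_anomalies(records)
    for message in problems:
        print(message)
    return 1 if (strict and problems) else 0


def main(argv: list[str], records: list[Record]) -> int:
    parser = argparse.ArgumentParser(description="integrity check sketch")
    parser.add_argument("--strict", action="store_true",
                        help="exit non-zero on hard anomalies")
    args = parser.parse_args(argv)
    return run(records, args.strict)
```

A CI step would pass the return value to `sys.exit`, so the workflow stops before the dump and PR stages whenever strict mode finds a hard anomaly.
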
Pin the public TechAPI repo as a submodule tracking main, mirroring TechAPI's link back to TechEngine. It is for browsing and linking only; the weekly-refresh workflow uses a separate token-authenticated checkout for writes.
Sort the import block and wrap an over-long assert in test_gpu_sources.py so 'ruff check app tests' passes in CI.
@Seungpyo1007 Seungpyo1007 merged commit e92aad8 into main Jun 1, 2026
1 check passed
@Seungpyo1007 Seungpyo1007 deleted the feat/weekly-refresh-pipeline branch June 1, 2026 17:58
@Seungpyo1007 Seungpyo1007 restored the feat/weekly-refresh-pipeline branch June 1, 2026 18:00